Abstract

The  goal  of  knowledge  delivery  research  is  to  create  a  technology  of  authorship 
by  computer.  Existing  technology  is  all  in  the  laboratory  stage,  and  is  limited  to  very 
small,  rigidly  constrained  texts.  This  research  project  has  focused  on  two  kinds  of 
developments  intended  to  overcome  these  limits:  1)  expanding  the  notation  and 
practices  of  knowledge  representation  so  that  a  wider  range  of  knowledge  can  be 
rendered  in  natural  language,  and  2)  creating  a  theory  of  text  structure  that  is  suitable 
as  a  basis  for  writing  programs  that  design  texts. 

This  is  the  final  research  report  for  AFOSR  contract  FQ8671-84-01007,  part  of 
an  ongoing  research  program  at  USC  Information  Sciences  Institute.  The  research  is 
not  complete,  and  is  being  continued  under  contract  F49620-87-C-0005.  This  report 
represents  the  research  accomplishments  in  the  interval  of  August  15,  1984  to  August 
14,  1986. 

1  Goals 

The  general  goal  of  Knowledge  Delivery  Research  is  to  create  a  technology  of 
authorship  by  computer,  so  that  computers  can  represent  their  knowledge  freely  in 
written  English.  Since  existing  technology  is  limited  to  small  texts,  a  major  subgoal  is 
to  create  reusable,  size-insensitive  methods  for  programs  to  use  in  creating  texts.  These 
goals  are  being  pursued  using  the  methods  of  Artificial  Intelligence,  with  heavy  input 
from  Linguistics. 


2  Accomplishments 
2.1  Knowledge  Notation 

The  English  language,  and  much  of  human  knowledge,  is  organized  around 
concepts  of  actions  and  their  participants.  Terms  such  as  "creator",  "recipient"  and 
"owner"  are  not  really  definable  apart  from  some  sort  of  action  orientation.  In 
contrast,  the  knowledge  notations  of  AI  and  mathematics  do  not  give  any  special 
organizational  place  to  actions  or  their  participants,  and  there  are  no  strong  precedents 
for  representing  such  knowledge. 

This  creates  several  difficulties  for  expert  systems  and  for  English  language 
knowledge delivery:

1. Information about actions tends to get represented inconsistently, with the
result that uniform information processing methods yield inconsistent
results.

2.  Systematic  expression  in  English  of  knowledge  about  actions  is  made 
difficult. 

3.  Computer  processes  which  interpret  action-oriented  English  tend  to  lose 
information. 

In  regularizing  knowledge,  the  uniformity  can  come  either  from  a  restrictive 
notation  or  from  uniform  conventions  of  use  of  a  less  restrictive  notation.  The  general 
knowledge  notations  of  AI  and  mathematics  are  not  easily  converted  into  restrictive 
notations  that  enforce  uniform  or  canonical  representation  of  actions.  To  do  so  would 
dilute  some  of  their  carefully  developed  advantages.  We  are  therefore  developing 
conventions  of  use  of  existing  notations  for  representing  knowledge  of  actions. 

One  of  the  basic  representational  strategies  of  AI  is  to  use  a  taxonomy  of  concept 
types.  Commonly  these  are  organized  around  the  entities  of  the  domain  of  processing, 
such  as  chemical  reactions  or  diagnosis  of  electronic  equipment.  Generic  concepts,  such 
as  action,  are  seldom  used.  Such  concepts  are  typically  not  in  conflict  with  the  domain, 
but  in  early  design  they  are  often  considered  superfluous. 

We  are  building  a  highly  abstract  taxonomy,  including  action  concepts;  it  is 
intended  as  a  base  upon  which  more  detailed  (domain)  concepts  can  be  added.  This 
taxonomy,  called  Upper  Structure,  provides  for  a  wide  range  of  the  organizing  concepts 
of English, not just action and participants. It is expressed in NIKL, a well-known AI
knowledge notation [Schmolze & Brachman 82]. In NIKL, concepts have not only positions
in a hierarchy, but also roles which relate them to other concepts. NIKL provides
inheritance  of  roles  in  the  hierarchy.  Figure  1  shows  the  Upper  Structure  taxonomy  of 
concepts,  without  roles,  and  Figure  2  shows  action  with  related  concepts. 
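
The idea of a taxonomy whose concepts carry roles, with roles inherited down the hierarchy, can be illustrated with a minimal sketch. This is not NIKL syntax and not the actual Upper Structure: the class `Concept`, its methods, and the concept and role names below are hypothetical, chosen only to show how a domain concept added beneath a generic action concept picks up the roles declared above it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Concept:
    """A node in a concept taxonomy with locally declared roles."""
    name: str
    parent: Optional["Concept"] = None
    # Role name -> name of the concept that fills the role (local declarations only).
    roles: Dict[str, str] = field(default_factory=dict)

    def all_roles(self) -> Dict[str, str]:
        """Return local roles merged over everything inherited from ancestors."""
        inherited = self.parent.all_roles() if self.parent else {}
        return {**inherited, **self.roles}

    def is_a(self, other: "Concept") -> bool:
        """True if `other` is this concept or one of its ancestors."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


# Hypothetical fragment: concept and role names are illustrative assumptions.
thing = Concept("thing")
action = Concept("action", parent=thing, roles={"actor": "thing"})
transfer = Concept("transfer", parent=action, roles={"recipient": "thing"})
send_mail = Concept("send-mail", parent=transfer, roles={"message": "thing"})

print(sorted(send_mail.all_roles()))  # ['actor', 'message', 'recipient']
print(send_mail.is_a(action))         # True
```

In this sketch the domain concept `send-mail` declares only one role of its own; the participant roles come from the generic concepts above it, which is the sense in which an abstract base can make action knowledge uniform across domains.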

Finding  an  appropriate  hierarchic  representation  is  made  difficult  both  by  the 
interactions  of  representational  choices  and  by  the  high  level  of  generality  desired.  The 
version  shown  has  been  used  in  an  English  generation  program,  Penman,  which  has 
been  tested  with  several  knowledge  domains,  including  computer  mail  and  calendars, 
assistance  to  Penman  users,  and  a  weapons  data  base. 

2.2  Text  Organization 

Much  of  the  knowledge  in  modern  computer  programs  is  too  complex  to  express 
in  single  sentences,  making  it  necessary  to  organize  larger  texts  for  knowledge  delivery. 
Unfortunately  there  is  no  well  established  technology  of  text  organization  at  a  level  of 
detail that could guide computer programming.

Figure 1: Hierarchy of Upper Structure Categories


We  are  therefore  working  to  create  a  basic  theoretical  knowledge  of  text 
organization.  We  have  named  one  major  portion  of  this  Rhetorical  Structure  Theory 
(RST). RST now provides a computationally useful account of

1.  What  the  functional  parts  of  a  text  are, 

2.  How  the  parts  are  related, 

3.  What  the  predictable  consequences  of  putting  parts  adjacent  are, 

4.  How  the  order  of  parts  is  determined, 

5.  Why  the  writer  includes  particular  parts  in  the  text. 

Figure 2: Action and Related Concepts

Recent work has put RST on a new definitional foundation [Mann & Thompson
87]. The benefits include these:

1. The relationships between semantic and communication constraints on texts
are identified, with the two types coresident in definitions in the same
terminology; previous methods left them unrelated.

2. Direct clausal assertion, as in "John broke the bottle", is related to implicit
assertion, as in the pair "John dropped the bottle. It broke.", which asserts
cause. Previous frameworks left them unrelated.

3. The varieties of implicit assertion in texts are enumerated, potentially leading
to program control of implicit assertion in multisentence text.

4. The theoretical source of implicit assertions is identified, and they are shown
to be derivable from related knowledge. (In our previous work they were
arbitrary correlates of text structure.)

5. A limited set of points of subjectivity in text analysis is identified, and the
corresponding subjectivity is shown to be unnecessary in text synthesis.
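
The distinction between clausal and implicit assertion can be made concrete with a small sketch built on the bottle example. It is an illustration under stated assumptions, not the RST definitions of [Mann & Thompson 87]: the classes `Clause` and `Relation`, the function `assertions`, and the predicate-style strings are all hypothetical. The point is only that the pair of sentences yields three assertions, two from the clauses and one from the relation that holds between them.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Clause:
    """A text part that makes a direct (clausal) assertion."""
    text: str
    assertion: str


@dataclass
class Relation:
    """A relation holding between adjacent text parts."""
    name: str
    parts: List[Union[Clause, "Relation"]]
    implicit: str  # assertion conveyed by the structure, not by any clause


def assertions(node: Union[Clause, Relation]) -> List[str]:
    """Collect clausal assertions in text order, then the relation's implicit one."""
    if isinstance(node, Clause):
        return [node.assertion]
    collected: List[str] = []
    for part in node.parts:
        collected.extend(assertions(part))
    collected.append(node.implicit)
    return collected


# The notation inside the strings is an assumption made for this sketch.
dropped = Clause("John dropped the bottle.", "drop(John, bottle)")
broke = Clause("It broke.", "break(bottle)")
pair = Relation("cause", [dropped, broke],
                "cause(drop(John, bottle), break(bottle))")

for line in assertions(pair):
    print(line)
```

A text planner that holds such a structure could, in principle, decide whether a causal claim should be stated in a clause or left to be conveyed by adjacency.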

Recent work has created an RST Structure Construction Procedure, a procedural
descriptive  framework  which  can  be  refined  into  programmable  text  structure  planners 
for  particular  knowledge  domains  and  tasks  [Ford  &  Mann  86a],  [Ford  &  Mann  86b]. 
This  construction  procedure  is  step  2  in  the  larger  picture  presented  in  Figure  3. 

There  is  a  companion  plan,  a  utilization  procedure  which  shows  how  the  planner 
can  be  applied  to  particular  domains.  As  part  of  developing  text  construction 
techniques,  this  utilization  procedure  is  currently  undergoing  its  first  extensive  test. 


3  Remaining  Goals 

Corresponding  to  the  two  areas  of  accomplishment  described  above,  there  are 
two  areas  of  continuing  need  in  text  generation  research:  knowledge  representation  and 
text  design. 

In  the  knowledge  representation  area  we  have  identified  a  collection  of  about  10 
critical limits on present representational notations and techniques. Of these we are
actively addressing three: time, actions and participants, and propositional relations.
All three are crucial, both because they are prominent in English syntax (as tense,
transitivity and conjunctive relations) and because they are prominent in representing
the sorts of computer operations that arise in AI expert systems.

Our  approach  to  these  involves  a  combination  of  development  of  effective 
notational  conventions  and  development  of  the  notations  themselves.  Experiments  are 
under way, but not yet ready for evaluation. We plan to address the other limits when
these three are relatively well developed.

In  the  text  design  area,  a  good  start  has  been  made  in  automatic  design  of  text 
structure,  through  constructive  RST.  Part  of  the  progress  has  come  in  factoring  the 
text  design  problem  into  parts,  and  separating  structure  building  processes  from 
processes  that  work  with  other  characteristics  of  the  text  being  designed.  Figure  3 
shows  our  current  factoring. 

The  processes  other  than  structure  building  are  in  a  much  more  rudimentary 
state  of  development.  For  most  of  them,  trivialized  approaches  exist  which  can  be  used 
to  generate  basic  texts,  but  the  issues  must  be  investigated  in  order  to  develop  high 
fluency  in  generation. 
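
The factoring listed in Figure 3 can be pictured as a chain of separate stages, with structure building isolated from the processes before and after it. The sketch below is only that picture, with deliberately trivialized placeholders for each stage; it is not the Penman implementation or the RST Structure Construction Procedure. The function names, the dictionary used to pass the plan along, and the rule that the first item becomes the nucleus are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def select_material(plan: Dict) -> Dict:
    """Step 1 (before structure building): decide what to convey.
    Placeholder: convey everything that was supplied."""
    plan["material"] = list(plan["knowledge"])
    return plan


def build_structure(plan: Dict) -> Dict:
    """Step 2 (structure building): organize the material.
    Placeholder: first item is the nucleus, the rest are satellites.
    Assumes at least one item of material."""
    first, *rest = plan["material"]
    plan["structure"] = {"nucleus": first, "satellites": rest}
    return plan


def realize(plan: Dict) -> Dict:
    """Step 3 (after structure building): produce text.
    Placeholder: emit the nucleus, then the satellites, unchanged."""
    structure = plan["structure"]
    plan["text"] = " ".join([structure["nucleus"], *structure["satellites"]])
    return plan


# Each stage can be replaced independently of the others.
PIPELINE: List[Callable[[Dict], Dict]] = [select_material, build_structure, realize]


def generate(knowledge: List[str]) -> str:
    """Run the stages in order and return the resulting text."""
    plan: Dict = {"knowledge": knowledge}
    for stage in PIPELINE:
        plan = stage(plan)
    return plan["text"]


print(generate(["The meeting is at noon.", "It is in room 12."]))
```

Because the stages communicate only through the shared plan, a rudimentary stage can later be swapped for a more fluent one without touching structure building.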


4  Technical  Summary 

Substantial  progress  has  been  made  in  creating  a  technology  with  which 
computers  can  represent  their  knowledge  freely  in  written  English.  The  methods  are 
being  made  reusable  and  compatible  with  English  partly  by  encoding  generic  knowledge 
in an AI-style Upper Structure.

Methods  for  designing  texts  are  being  developed  based  on  Rhetorical  Structure 
Theory,  which  provides  extensive  information  about  the  available  alternatives  of  text 
organization  and  the  effects  of  using  them. 

Experiments on the representation of time, actions and participants, and
propositional relations have begun but are not complete. Similarly, experiments in
automatic structure building have begun. These need to be carried into the
evaluation  and  process  improvement  stages.  Additional  work  is  needed  in  several  other 
quality-limiting  areas  of  knowledge  representation  which  have  already  been  identified, 
and in non-structural aspects of automatic text design.

1. Before building RST structure:

a. General decisions about what to accomplish, what knowledge to use.
This yields a body of material to convey.

b. Identification of the audience.

2. RST structure building, including

a. Organizing the given body of material.

b. Supplementing it as needed, with evidence, concessives,
circumstantials, antithesis, contrast and other supporting material.

c. Ordering nuclei and satellites.

3. After building RST structure:

a. Theme control,

b. Sentence scope,

c. Conjunction uses,

d. Lexical choice,

e. Formulaic text, e.g. "Sincerely yours,",

f. Grammatical realization.

Figure 3: RST Structure Construction and Related Processes


5  Publications 


The  publications  for  the  contract  period,  together  with  the  submitted 
publications  and  reports  which  are  currently  beyond  first  draft  form,  and  the  major 
public  presentations  of  the  project  work,  are  as  follows: 


1.  [Mann  &  Thompson  85]  —  "Assertions  from  Discourse  Structure."  It  turns 
out  that  RST  relations  have  assertional  properties,  in  which  they  convey 
information  from  the  discourse  structure,  distinct  from  the  clausal  assertions 
of  the  text.  In  [Mann  &  Thompson  86]  this  phenomenon  is  identified,  and  in 
[Mann  &  Thompson  85]  the  link  between  RST  and  the  phenomenon  is 
made.

2. [Mann 86a] -- The 1986 development status of RST is described in this
project  report,  which  will  be  part  of  a  1987  book  based  on  the  Third 
International  Workshop  on  Text  Generation  [Kempen  86]. 

3.  [Thompson  &  Mann  87]  —  "Clause  Combining  —  Antithesis  Relations  at 
Large  Scale  and  Clause  Scale."  In  this  paper,  clausal-level  and  large-scale 
antithesis  relations  are  found  to  rest  on  the  same  discourse  configurations. 
This result is significant for unifying the planning methods of large and small scales.
Accepted  for  inclusion  in  an  edited  book. 

4. [Matthiessen & Thompson 86] -- "Clause Combining -- Hypotaxis,
Embedding and 'Subordination'." In this paper, RST is used to establish
that so-called subordination is a composite category consisting of two
different phenomena. This is a project report not yet printed; it has also been
accepted for publication in an edited book in 1987.

5.  [Matthiessen  87]  —  The  limitations  of  systemic  notation  and  the  phenomena 
which  are  thereby  made  hard  to  represent  are  explored.  A  conference  paper 
already  included  in  an  announced  book,  forthcoming  as  a  project  report. 

6.  [Kasper  87]  --  The  benefits  of  representing  systemic  grammars  in  functional 
unification  form  are  explored,  along  with  problems  and  solutions.  A 
conference  paper  already  included  in  an  announced  book,  forthcoming  as  a 
project  report. 

7. [Cumming 86] -- The lexicon is a crucial and distinctive part of generation
technology.  This  paper  compares  the  lexical  technologies  of  a  wide  variety 
of  existing  generation  systems.  To  appear  in  an  edited  book  based  on  the 
workshop.  Currently  being  printed  as  a  project  report. 

8.  [Sondheimer  86]  --  This  paper  describes  methods  for  linking  the  Nigel 
grammar  to  the  Upper  Structure  abstractions  hierarchy,  using  a  first-order 
predicate  calculus  input  language  for  the  text  generator,  and  the  KL-TWO 
knowledge  notation  as  an  intermediary. 

9.  [Mann  86b]  —  This  paper  compares  the  text  structuring  methods  of  RST 
and  McKeown’s  TEXT  system.  It  was  presented  as  an  invited  paper  at 
COLING86,  and  has  been  accepted  to  appear  in  an  edited  book  in  1987. 

10.  [Mann  &  Thompson  87]  --  This  large  paper  features  the  new  definitional 
basis  of  RST.  It  is  scheduled  to  appear  in  an  edited  book. 

11.  [Ford  &  Mann  86a]  --  This  paper,  related  to  the  companion  paper  below, 
describes  the  current  design  of  a  process  of  text  structure  creation  using 
RST.  To  appear  as  a  project  report. 


12.  [Ford  &  Mann  86b]  —  The  process  of  text  structure  creation  has  been  tested 
in  a  variety  of  ways.  The  tests,  together  with  some  refinements  which  they 
suggested,  are  described  in  this  paper.  To  appear  as  a  project  report. 

13.  [Poulton  86]  —  Nigel  is  a  large  systemic  grammar  of  English  represented  in 
a  computer  program  for  sentence  generation.  This  report  is  documentation 
for  the  program  user.  To  appear  as  a  project  report. 

14.  [Matthiessen  86]  —  Nigel  generates  sentences  relative  to  an  environment  of 
knowledge  of  the  world  and  the  text  plan.  This  paper  describes  the 
organization  of  that  environment.  To  appear  as  a  project  report. 

6  Outside  Research  Based  on  the  Knowledge  Delivery  Project 

We  are  aware  of  the  following  outside  research  based  on  this  research  project 
which  has  been  published  or  begun: 

1. [Fox 84] -- "Predicting Anaphora." In this completed PhD dissertation,
RST is used in predicting when anaphora will be used.

2.  [Noel  86]  —  The  methods  of  RST  analysis  have  been  combined  with 
complementary  methods  in  a  linguistic  characterization  of  the  presentational 
methods  of  the  BBC  World  News  Service  broadcasts.  A  completed  and 
published  dissertation,  not  PhD. 

3. [Cui 85] -- "Contrastive Rhetoric -- Comparing Structures of Essays in Chinese
and English." This is a completed MA dissertation, based on RST.

4. "Clause Combining -- Switch Reference in Quechua." This is a PhD
dissertation in progress at UCLA, based on RST.

In  addition  to  these  identified  presentations,  there  were  presentations  by  Dr. 
Mann  and  Dr.  Sondheimer  at  the  most  recent  AAAI,  ACL,  COLING,  FJCC,  German  AI 
Conference  and  (January  1987)  TINLAP  conferences  which  represented  this  research  in 
substantial  ways  but  were  not  confined  to  it. 

7  Project  Personnel 

The technical staff members of this project were:*

William C. Mann        Principal Investigator
Norman K. Sondheimer   Co-project-leader
Robert Albano
Susanna Cumming
Cecilia Ford
Robert Kasper
Shari Naberschnig
Sandra A. Thompson
Richard Whitney
Thomas Galloway
Christian M. I. M. Matthiessen
Lynn Poulton
George Vamos

Support staff were:

Pam Andro
Lisa Trentham
Heidi Julian

*This project has benefited from interaction with another ongoing project on text
generation sponsored by DARPA under contract MDA903 81 C 0335. The list of project
personnel is unusually long because they have all participated in both efforts.